SCIENTIFIC REPORTS

OPEN

SUBJECT AREAS: MICROBIOME; RESPIRATORY TRACT DISEASES

Received 20 January 2014
Accepted 24 February 2014
Published 11 March 2014

Correspondence and requests for materials should be addressed to J.J.L. (jlipuma@umich.edu)



Modeling the Impact of Antibiotic 
Exposure on Human Microbiota 



Jiangchao Zhao 1, Susan Murray 2 & John J. LiPuma 1

1 Department of Pediatrics and Communicable Diseases, University of Michigan Medical School, Ann Arbor, Michigan, USA,
2 Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, USA,
3 Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan, USA.

Human-associated microbial communities play important roles in health and disease. Antibiotic 
administration is arguably one of the most important modifiable determinants of the composition of the 
human microbiota. However, quantitatively modeling antibiotic use to account for its impact on microbial 
community dynamics presents a challenge. We used antibiotic therapy of chronic lung infection in persons 
with cystic fibrosis as a model system to assess the influence of key variables of therapy on measures of 
microbial community perturbation. We constructed multivariate linear mixed models with bacterial 
community diversity as the outcome measure and various scales of antibiotic weighting as predictors, while 
controlling for other variables. Antibiotic weighting consisted of three components: (i) dosing duration; (ii) 
timing of administration relative to sample collection; and (iii) antibiotic type and route of administration. 
Antibiotic weighting based on total dose and proximity to the time of sampling was most predictive of 
bacterial community change. Using this model to control for antibiotic use enabled the identification of 
other significant independent predictors of microbial community diversity such as dominant taxon, disease 
stage, and gender. Quantitative modeling of antibiotic use is critical in understanding the relationships 
between human microbiota and disease treatment and progression. 



The advent of next generation sequencing has enabled culture-independent profiling of complex human- 
associated bacterial communities (microbiota) in unprecedented detail. Application of this approach has 
dramatically expanded our understanding of the diversity of microbiota associated with various human 
body habitats 1, different human populations 2, and periods of health and illness 3-7. Metagenomic, metabolomic,
and metatranscriptomic analyses have now begun to investigate the functional attributes of these microbial 
communities to better understand the role human microbiota play in health maintenance, predisposition to 
and pathogenesis of disease, and the response to therapy 8-13. Such studies view the human body as an ecosystem,
with human health being dependent in part upon the services provided by the host-associated microbiota. In this 
regard, the application of ecological theory to study human microbiota is gaining increasing attention 14 . Microbial 
community assembly theory, in particular, is finding application in efforts to understand the processes that shape 
diversity in local assemblages during periods of ill health, treatment and recovery. 

A key element of community assembly theory is an appreciation of the effects of community disturbance on 
diversity and the recovery of communities after perturbation. With respect to the human ecosystem, antibiotic 
therapy serves as a paradigm for disturbance of host-associated communities; in fact, antibiotic administration 
may be considered the most important and common form of disturbance of the human microbiota 14-19 . In studies 
of human gut microbiota, for example, the effect of antibiotic administration on diversity is far greater than the 
routine temporal variability in community composition 16,19,20.

Despite the prominent role that antibiotic therapy plays in changing microbial community composition
and driving the reassembly of local communities, robust quantitative models of antibiotic use are lacking.
This gap makes it difficult to incorporate measures of antibiotic use into studies of the relationships
between antibiotic-driven community perturbation, movement, and reassembly. Equally important, it presents
a challenge to studies in which antibiotic use must be controlled as one of several variables that affect
community diversity and disease progression. This is especially relevant in studies of chronic infectious
diseases characterized by recurrent, intensive antibiotic administration.

We used treatment of persons with cystic fibrosis (CF), a condition characterized by persistent bacterial
infection of the airways managed with chronic maintenance antibiotic therapy as well as intensive episodic
antibiotic treatment, to develop a quantitative model of antibiotic use. We tested the effect of antibiotic
administration on airway bacterial community diversity (our outcome measure of interest) by considering the
duration and timing of administration relative to the day of sampling, as well as antibiotic class and route
of administration. A training data set, composed of 116 sputum samples that had been extensively characterized
with respect to microbial community profiles, antibiotic exposure, and metrics of patient health, was used to
develop a test model 21. This model was then validated with a larger data set consisting of 362 similarly
characterized respiratory samples. We show how this model may be adapted to other studies of the relationship
between microbial community dynamics, antibiotic use and disease progression.

Table 1 | Characteristics of patients in the training and validation data sets

Variables                            Training Set (n = 6)    Validation Set (n = 60)
Samples per patient, Mean (Range)    [illegible]             [illegible]
Gender, Count (%)
  Male                               6 (100)                 32 (53)
  Female                             0 (0)                   28 (47)
Disease Severity 1, Count (%)
  Mild                               3 (50)                  13 (22)
  Moderate                           3 (50)                  18 (30)
  Severe                             0 (0)                   29 (48)
CFTR Genotype, Count (%)
  ΔF508 homozygous                   4 (67)                  24 (39)
  ΔF508 heterozygous                 1 (17)                  29 (49)
  Others                             1 (17)                  7 (12)

1 Patients were assigned to one of three disease severity categories

Table 2 | Characteristics of samples in the training and validation data sets

Variables                            Training Set (n = 116)  Validation Set (n = 362)
Age in years 1, Count (%)
  <17                                0 (0)                   41 (11)
  17-26                              77 (66)                 165 (46)
  27-37                              39 (34)                 91 (25)
  >37                                0 (0)                   65 (18)
Disease Stage 2, Count (%)
  Early                              44 (38)                 65 (18)
  Intermediate                       31 (27)                 176 (49)
  Late                               41 (35)                 121 (33)
Dominant OTUs 3, Count (%)
  Pseudomonas                        95 (82)                 205 (57)
  Burkholderia                       0 (0)                   43 (12)
  Streptococcus                      8 (7)                   52 (14)
  Others                             13 (11)                 62 (17)

1 Age of patient when sample was obtained.
2 Specimens were assigned to one of three disease stage categories, defined by percent predicted forced
expiratory volume in one second (%FEV1) values at the time of sample collection: early (%FEV1 > 70),
intermediate (70 ≥ %FEV1 ≥ 40), or advanced (%FEV1 < 40).
3 OTUs: operational taxonomic units. The dominant OTU was defined as the most abundant OTU detected in the
sample.

Results 

Characterization of patients and sputum samples. The patients 
and sputum samples included in the training and validation data 
sets were characterized with respect to two types of variables 
(Tables 1 and 2). Fixed, patient-specific variables included gender,
CFTR genotype, and disease severity (or aggressiveness) phenotype 21,22.
Time-dependent, sample-specific variables included
patient age, lung function, and disease stage 21 at the time of sample
collection. Lung function was measured as percent predicted forced
expiratory volume in one sec (%FEV1). Disease stage was defined as
early when serial %FEV1 measures were >70; intermediate when
%FEV1 measures were between 70 and 40; and advanced when
%FEV1 measures were <40. The dominant operational taxonomic
unit (OTU) detected in each sample by deep-sequencing was also 
included as a sample-specific variable. This was defined as the most 
abundant OTU detected in the sample. Bacterial community 
diversity of each sputum sample was measured by calculating the 
inverse Simpson index, which takes into account both the number of 
OTUs (richness) present in the sample and their relative abundance 
(evenness). 
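
The diversity measure described above can be sketched in a few lines. This is a minimal illustration that assumes the plain form 1 / sum(p_i^2), where p_i is the relative abundance of OTU i; the function name and the read counts are hypothetical, and sequence-analysis software may use a finite-sample variant that gives slightly different values.

```python
def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2), from a list of OTU read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    # Empty OTUs contribute nothing to the sum, so they are skipped.
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical samples with 568 reads each: one dominated by a single OTU,
# one in which four OTUs are perfectly even.
dominated = [540, 20, 5, 3]
even = [142, 142, 142, 142]

print(inverse_simpson(dominated))
print(inverse_simpson(even))
```

A sample dominated by one OTU scores close to 1, while a perfectly even sample scores the number of OTUs present, which is why the index reflects both richness and evenness.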

The training set samples (n = 116) were from six men previously 
described by us in detail 21 . All six patients had a mild or moderate 
disease severity phenotype. Samples were collected when these 
patients were between 18 and 30 years of age. The samples were 
roughly evenly distributed among periods when these patients were 
in early, intermediate, or late stages of lung disease. The dominant 
OTU in most (82%) samples represented the genus Pseudomonas. 
The patients (n = 60) and sputum samples (n = 362) in the valid- 
ation set were more heterogeneous. Men represented 53% of patients 
and there was a greater distribution of patients with mild, moderate 
and severe disease severity phenotypes. The validation set samples 
were also more diverse with respect to the dominant OTU detected. 
Of note, none of the patients in either the training set or the valid- 
ation set were smokers. 



Antibiotic weighting score development. The antibiotic exposure 
associated with each sample was measured by assessing the antibiotic 
administration to the source patient during a 30-day window prior to 
sample collection. The duration of exposure (no. of days receiving the 
antibiotic), the timing of administration relative to the day of 
sampling (e.g., 20 days vs 2 days prior to sampling), and the 
antibiotic class and route of administration were determined for 
each sample. 

These variables were used to develop antibiotic weighting compo- 
nents that yielded scores used as covariates in models predicting 
bacterial community diversity. 

Weight component A. (wcA; Equation 1 in Materials and Methods,
and Fig. S1) accounts for the duration of antibiotic use during this 30-day
window by assessing the number of days an antibiotic was (wcA
= 1) or was not (wcA = 0) administered (Fig. 1A). wcA for each 
antibiotic was determined from observed sample level data without 
subjective assessment and was constructed similarly for all samples 
in both the training and validation data sets. 
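
As a rough sketch of how wcA could be built from dosing records, the function below turns a list of treatment courses into the 30 daily 0/1 values. The function name, the (start, end) record layout, and the convention that day 30 is the day immediately before sampling are all assumptions made for illustration; the source does not specify them.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30


def wca_indicator(sample_date, courses, window=WINDOW_DAYS):
    """Return [wcA_1, ..., wcA_window] for one antibiotic.

    Day i = 1 is the farthest day from sampling and day i = window is the day
    immediately before sampling (an assumed convention).
    `courses` is a list of (start_date, end_date) pairs, both inclusive.
    """
    indicator = []
    for i in range(1, window + 1):
        day = sample_date - timedelta(days=window - i + 1)
        on_drug = any(start <= day <= end for start, end in courses)
        indicator.append(1 if on_drug else 0)
    return indicator


# Hypothetical record: a 14-day course ending 8 days before sampling.
wca = wca_indicator(date(2010, 3, 31), [(date(2010, 3, 10), date(2010, 3, 23))])
print(wca, sum(wca))
```

Because the indicator is derived directly from dates, it needs no subjective judgment, which matches the way wcA is described above.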

Weight component B. (wcB; Equation 2 in Materials and Methods) 
accounts for the proximity of antibiotic use relative to the sampling 
day. Four weighting schemes, described by the formulas in Equation 
2, were assessed, including (i) an equal weight for each day irrespect- 
ive of proximity to the sampling date, (ii) a linear increase in weights 
with increasing proximity to the sampling date, and either (iii) a 
concave or (iv) a convex increase in weights with increasing proximity
to the sampling date (Fig. 1B and Table S1). A score for each
antibiotic administered during the 30 days prior to the date of each 
sputum sample was calculated as a product of wcA and wcB 
(Equation 3 in Materials and Methods). The sum of the scores for 
all antibiotics administered in association with a sputum sample was 
calculated to provide the total antibiotic exposure for each sample 
(Equation 4 in Materials and Methods). 
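
The per-antibiotic score (Equation 3) and the total exposure (Equation 4) described above are simple weighted sums, as the following sketch shows. The weight vectors used here are placeholders chosen only for illustration and are not the Table S1 values; the drug names and exposure patterns are likewise hypothetical.

```python
def antibiotic_score(wca, wcb):
    """Equation 3: sum over days of wcA_ij * wcB_i for one antibiotic."""
    if len(wca) != len(wcb):
        raise ValueError("wcA and wcB must cover the same days")
    return sum(a * b for a, b in zip(wca, wcb))


def total_exposure(wca_by_antibiotic, wcb):
    """Equation 4: sum of per-antibiotic scores, all antibiotics weighted equally."""
    return sum(antibiotic_score(wca, wcb) for wca in wca_by_antibiotic.values())


DAYS = 30
# Placeholder proximity weights (assumptions, not the study's Table S1 values).
equal_weights = [1.0] * DAYS
linear_weights = [i / DAYS for i in range(1, DAYS + 1)]

# Hypothetical exposure: drug_x on days 1-10, drug_y on days 21-30.
exposure = {
    "drug_x": [1] * 10 + [0] * 20,
    "drug_y": [0] * 20 + [1] * 10,
}

print(total_exposure(exposure, equal_weights))
print(total_exposure(exposure, linear_weights))
```

Under equal weights the two courses contribute identically; under any increasing scheme the course nearer to sampling contributes more, which is the behavior wcB is meant to capture.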

Figure 1 | Antibiotic weight components A (wcA) and B (wcB). Panel (A) depicts daily wcA values for patient P2
during the 30 days prior to collection of sample 27. This patient received four antibiotics during this time
(tobramycin-IV, meropenem-IV, ciprofloxacin-PO, and doxycycline-PO). A value of 1 indicates antibiotic
administration on that day, while 0 indicates no antibiotic administration. Panel (B) depicts wcB profiles
during the 30 days prior to sampling. These profiles indicate equal weighting (black) as well as linear (red),
concave (blue), and convex (green) increasing weights as days approach the sampling time. The data points for
each profile (circles) were drawn based on values calculated by Equation 2 in the text; each value is listed
in Table S1.

Using the total antibiotic exposure for each sputum sample, the training dataset was analyzed to determine
which of the four wcB weighting schemes best predicted the inverse Simpson index, which had been previously
calculated for each sample, based on the Akaike Information Criterion (AIC), after adjusting for age and
%FEV1 at the sampling time. A comparison of AICs indicated that the convex increasing weighting scheme
provided the best prediction for the inverse Simpson index (Table S2). When the larger and independently
sampled validation sample set was similarly analyzed, using the same four wcB weighting schemes, the convex
increasing weighting scheme again provided the best prediction of the inverse Simpson index. Analyses on 500
bootstrap samples from the validation set demonstrated the stability of this wcB weighting scheme. The wcB
convex increasing weights were ranked as the best fit in 75% of the bootstrap samples, outperforming the
other choices, and were therefore used in the remainder of the study whenever wcB was considered.
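
The bootstrap stability check described above can be illustrated with a simplified sketch. Several things here are assumptions rather than the study's procedure: ordinary least squares with a single predictor stands in for the linear mixed model, no covariates are adjusted for, samples rather than patients are resampled, the four weight vectors are placeholders (not the Table S1 values), and the data are synthetic.

```python
import math
import random

DAYS = 30
# Placeholder weighting schemes (assumptions; the study's values are in Table S1).
SCHEMES = {
    "equal": [1.0] * DAYS,
    "linear": [i / DAYS for i in range(1, DAYS + 1)],
    "concave": [math.sqrt(i / DAYS) for i in range(1, DAYS + 1)],
    "convex": [(i / DAYS) ** 2 for i in range(1, DAYS + 1)],
}


def ols_aic(x, y):
    """AIC of a one-predictor least-squares fit: n*ln(RSS/n) + 2k with k = 3."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return n * math.log(max(rss, 1e-12) / n) + 2 * 3


def selection_frequency(scores, y, n_boot=500, seed=0):
    """Fraction of bootstrap resamples in which each scheme has the lowest AIC."""
    rng = random.Random(seed)
    n = len(y)
    wins = dict.fromkeys(scores, 0)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        aics = {name: ols_aic([s[i] for i in idx], yb) for name, s in scores.items()}
        wins[min(aics, key=aics.get)] += 1
    return {name: count / n_boot for name, count in wins.items()}


# Synthetic data in which the "convex" score truly drives diversity.
data_rng = random.Random(1)
courses = []
for _ in range(200):
    start = data_rng.randrange(DAYS)
    length = data_rng.randint(1, 14)
    courses.append([1 if start <= d < start + length else 0 for d in range(DAYS)])
scores = {name: [sum(a * w for a, w in zip(c, wts)) for c in courses]
          for name, wts in SCHEMES.items()}
y = [6.0 - 0.5 * s + data_rng.gauss(0, 0.2) for s in scores["convex"]]

freq = selection_frequency(scores, y)
print(freq)
```

The returned fractions play the role of the "best fit in 75% of the bootstrap samples" figure: a scheme that wins in most resamples is a stable choice rather than an artifact of one particular sample set.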

Weight component C. (wcC; Equation 5 in Materials and Methods) 
accounts for the effects of antibiotic type and route of administration 
on predicting the inverse Simpson index. Since only 16 of the 37 
antibiotic types observed in the validation set were used in the train- 
ing set, we based our initial evaluation of this weight component on 
analyses of the combined training and validation sets. First, the abil- 
ity of each antibiotic associated with a sputum sample (i.e., adminis- 
tered within 30 days of sample collection) to predict the inverse 
Simpson index was assessed, based on the wcA and wcB weighting 
components and adjusting for age and %FEV 1 at sampling. No sig- 
nificant interactions between multiple antibiotics were detected. 
Next, the AICs for the 37 antibiotic types were modeled and ranked 
from lowest (best prediction model) to highest (worst prediction 
model). Antibiotic type coefficients from the 37 AIC models were 
also ranked based on largest to smallest impact on inverse Simpson 
index (Table S3). 



The AIC and coefficient ranks were summed and sorted from 
lowest to highest and grouped into terciles. wcC values of 0.5, 0.33 
or 0.17 were assigned to each combination of antibiotic type and 
route of administration for the best, intermediate, and worst predic- 
tors of the inverse Simpson index, respectively (Equation 5 in 
Materials and Methods and Table S3). Antibiotics administered by 
the IV route were more likely to be in the top tercile (i.e., wcC value of 
0.5) than were antibiotics administered orally or by inhalation (50% 
of IV administered antibiotics had wcC of 0.5 compared to 16% of 
oral/inhaled antibiotics; Fisher's exact test p = 0.04). 
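
The rank-sum and tercile step described above can be sketched as follows. The function name and inputs are hypothetical, and three details are assumptions because the text does not state them: "largest impact" is taken to mean the most negative coefficient, ties in the rank sum are broken alphabetically, and terciles are formed by ceiling division (13, 13 and 11 members for 37 antibiotics).

```python
def assign_wcc(aic_by_abx, coef_by_abx, weights=(0.5, 0.33, 0.17)):
    """Sum the AIC rank and coefficient rank of each antibiotic, then split into terciles.

    AIC rank 1 is the lowest AIC; coefficient rank 1 is the most negative
    coefficient (assumed to mean the largest reduction in diversity).
    """
    names = list(aic_by_abx)
    aic_rank = {n: r for r, n in enumerate(sorted(names, key=aic_by_abx.get), 1)}
    coef_rank = {n: r for r, n in enumerate(sorted(names, key=coef_by_abx.get), 1)}
    ordered = sorted(names, key=lambda n: (aic_rank[n] + coef_rank[n], n))
    size = -(-len(ordered) // 3)  # ceiling division gives the tercile size
    return {n: weights[min(pos // size, 2)] for pos, n in enumerate(ordered)}


# Hypothetical AICs and coefficients for six antibiotic/route combinations.
aic = {"a": 100, "b": 101, "c": 102, "d": 103, "e": 104, "f": 105}
coef = {"a": -2.0, "b": -1.5, "c": -1.0, "d": -0.5, "e": 0.0, "f": 0.5}

wcc = assign_wcc(aic, coef)
print(wcc)
```

Using ranks rather than raw AICs or coefficients keeps the two criteria on the same scale before they are combined.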

The combined antibiotic weighting score (Equation 6 in Materials 
and Methods) for each sputum sample was calculated by multiplying 
wcA, wcB, and wcC for each of the 30 days prior to sputum collection 
and then summing these scores across the 30 days. Inclusion of wcC
improved the prediction of community diversity (inverse Simpson
index) in each of the training, validation and combined sets compared
with using wcA and wcB alone. A permutation test indicated
that inclusion of wcC in the combined antibiotic weighting
score gave a significantly lower AIC value (better model fit; p < 
0.001) than would have occurred under 5000 random permutations 
of the wcC values across the 37 antibiotic types. The distribution of 
the combined antibiotic load score across validation samples is 
shown in Figure S2. 
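
The combined score (Equation 6) and the idea behind the permutation test can be sketched as below. The helper names, the toy statistic, and the (hits + 1) / (n + 1) p-value convention are assumptions for illustration; in the study the statistic was the AIC of a refitted mixed model, which is not reproduced here, and the weights shown are placeholders.

```python
import random


def combined_score(wca_by_abx, wcb, wcc_by_abx):
    """Equation 6: sum over antibiotics and days of wcC_j * wcA_ij * wcB_i."""
    return sum(
        wcc_by_abx[name] * sum(a * b for a, b in zip(wca, wcb))
        for name, wca in wca_by_abx.items()
    )


def permutation_p_value(observed, statistic, wcc_by_abx, n_perm=5000, seed=0):
    """Share of random wcC reassignments whose statistic is <= the observed one.

    Lower values are treated as better, as with AIC. `statistic` maps a
    {antibiotic: wcC} dict to a number.
    """
    rng = random.Random(seed)
    names = list(wcc_by_abx)
    values = [wcc_by_abx[n] for n in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if statistic(dict(zip(names, values))) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


DAYS = 30
wcb = [i / DAYS for i in range(1, DAYS + 1)]  # placeholder weights, not Table S1
exposure = {"drug_x": [1] * 10 + [0] * 20, "drug_y": [0] * 20 + [1] * 10}
wcc = {"drug_x": 0.5, "drug_y": 0.17}

score = combined_score(exposure, wcb, wcc)


# Toy statistic: pretend the fit is better (lower) when drug_x carries more weight.
def toy_statistic(weights):
    return -weights["drug_x"]


p = permutation_p_value(toy_statistic(wcc), toy_statistic, wcc)
print(score, p)
```

With only two antibiotics the toy p-value is near 0.5; with 37 antibiotics and a real model fit, a small p-value indicates that the observed assignment of wcC values does better than chance reassignments.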

Predictors of community diversity in CF. To illustrate the utility of 
the antibiotic weighting scoring schemes, we included the combined 
antibiotic weighting score as a covariate in a multivariate model 
predicting community diversities in our collection of CF sputum
samples. This allowed us to explore associations between bacterial
community diversity (inverse Simpson index) and the patient- and
sample-specific variables associated with these samples. Since disease
stage was defined based on lung function (%FEV1), these correlated
variables could not both be included in the model. We therefore
included disease stage as a covariate in the model since it has a
stronger predictive ability (AIC 1572.14) than lung function (AIC
1579.56).

Table 3 shows results from the multivariate linear mixed model 
predicting the inverse Simpson index in the validation set. An 
increase of one unit in the antibiotic weighting score calculated over 
the month prior to sampling is associated with a 1.25 point decrease 
in the inverse Simpson index on average (95% CI, 0.44, 2.05 point 
decrease, p = 0.002) after adjusting for dominant OTU, disease 
severity, gender, CFTR genotype, disease stage, and patient age. 
We observed significant differences in community diversity with 
respect to dominant OTU (composite P < 0.001). Samples domi- 
nated by Pseudomonas or Burkholderia showed the least diversity, 
while samples dominated by Streptococcus showed the greatest 
diversity (approximately 3 inverse Simpson index points higher than 
Pseudomonas and Burkholderia, on average) after adjusting for age, 
gender, disease severity, disease stage, CFTR genotype, and antibiotic 
weighting in the previous month. On average, men had an approxi- 
mately 1 point higher inverse Simpson index than women after 
adjusting for other factors (p = 0.004). We observed decreasing 
diversity with advancing disease stage after adjusting for other fac- 
tors; on average, early disease stage had a 1.15 point higher inverse 
Simpson index than late disease stage (p = 0.02). We did not observe 
significant associations between community diversity and disease 
severity or CFTR genotype after adjusting for age, gender, dominant 
OTU, disease stage, and antibiotic use in the previous month. In 
addition, we did not observe a significant decrease in diversity with 
increasing age after adjusting for other factors. 
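
As a worked check of the contrasts quoted above, the adjusted difference between two dominant-OTU groups is the difference of their Table 3 coefficients, each of which is expressed relative to the reference category (Others). The symbols below are notation introduced here for illustration, and the 0.1-unit change in antibiotic score is an arbitrary example value.

```latex
\begin{align*}
\hat\beta_{\text{Strep}} - \hat\beta_{\text{Pseu}} &= 1.24 - (-1.59) = 2.83 \\
\hat\beta_{\text{Strep}} - \hat\beta_{\text{Burk}} &= 1.24 - (-1.83) = 3.07 \\
\Delta(\text{inverse Simpson}) \text{ for a 0.1-unit higher antibiotic score} &= 0.1 \times (-1.25) = -0.125
\end{align*}
```

The first two values agree with the "approximately 3 points" difference reported for Streptococcus-dominated samples.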

Figure 2 shows the community diversity of a "standardized" CF 
patient predicted by specific variables after adjusting for other vari- 
ables. For example, after controlling for other variables, the estimated 
community diversity is 5.7 if the community is dominated by 
Streptococcus and 2.6 if dominated by Burkholderia. 



Discussion 

It is not unexpected that antibiotic administration perturbs bacterial 
communities in human hosts, thereby confounding efforts to better 
understand the microbial community signatures of disease progres- 
sion, particularly in chronic diseases where prolonged and repeated 
antibiotic use is required. Quantifying the impact of antibiotic ther- 
apy on complex bacterial communities is challenging, involving a 
myriad of variables such as dosing duration and timing relative to 
sample collection, and antibiotic type and route of administration. In 
this study, we propose a model to address this challenge by devel- 
oping antibiotic weighting scores that account for these variables. 
Using these scores, a combined antibiotic score can be calculated and 
included as a covariate in multivariate models that assess other fac- 
tors that may be important predictors of community structure. We 
chose community diversity - as measured by the inverse Simpson 
index - as the outcome parameter in this study. However, the prin- 
ciples and approach we describe could be applied to account for 
antibiotic use in analyses assessing predictors of any number of other 
microbiota community outcome measures. 

Weight component A takes into account the duration of antibiotic 
administration. In our model we limited this only to antibiotic 
administration in the 30 days prior to sample collection. This is an 
admittedly arbitrary interval that could be modified in the model as 
appropriate for other studies. Further, a defined interval such as this 
does not account for cumulative antibiotic load that may have 
accrued over very long intervals. Such use, involving potential 'legacy 
effects' of prior antibiotic courses, could have a significant impact on 
shaping human-associated microbiota. Although a comprehensive 
accounting of long-term prior antibiotic use is often limited by the 
unavailability of reliable medical records, the increasing use of 
detailed electronic medical records may enable analyses of this sort 
in future studies. 

In accounting for the timing of antibiotic administration relative 
to the sample date (weight component B), antibiotic weighting based
on a convex increasing weighting scheme provided a better predic- 
tion of community diversity than did the other schemes evaluated. 
This expected result likely reflects community resilience after antibiotic perturbation; a longer interval
between antibiotic administration and sampling provides greater opportunity for the recovery of members of
the community affected by the antibiotic. Thus, an antibiotic administered closer to the time of sampling
would be expected to have a greater impact on the microbial community than would that same antibiotic
administered at a longer interval from the sampling time.

Table 3 | Multivariate linear mixed model including antibiotic use as a covariate

                                   95% Confidence Interval
Parameters          Coefficient    Lower Bound    Upper Bound    Wald P-Value    Composite P-Value
Intercept               5.73           3.98           7.48         <0.001           <0.001
Dominant OTUs                                                                       <0.001
  Pseudomonas          -1.59          -2.29          -0.89         <0.001
  Burkholderia         -1.83          -3.11          -0.56          0.005
  Streptococcus         1.24           0.40           2.08          0.004
  Others                0.00
Disease Severity                                                                     0.256
  Mild                  0.64          -0.80           2.08          0.38
  Moderate             -0.41          -1.40           0.59          0.42
  Severe                0.00
Gender                                                                               0.004
  Male                  1.16           0.39           1.93          0.004
  Female                0.00
CFTR (ΔF508)                                                                         0.119
  Homozygous           -1.11          -2.33           0.11          0.07
  Heterozygous         -1.21          -2.39          -0.04          0.04
  Other                 0.00
Disease Stage                                                                        0.067
  Early                 1.15           0.18           2.11          0.02
  Intermediate          0.46          -0.25           1.16          0.21
  Late                  0.00
Antibiotic Usage       -1.25          -2.05          -0.44          0.002            0.002
Age                    -0.04          -0.09           0.01          0.117            0.117

Figure 2 | Estimated community diversity by each predictor. Pseu: Pseudomonas, Burk: Burkholderia, Strep:
Streptococcus. The predicted values by each predictor were calculated by controlling for other predictors
based on the "standardized" CF patient profile: 57%, 12%, 14%, and 17% chance of being dominated by
Pseudomonas, Burkholderia, Streptococcus or other bacteria, respectively; 18%, 49%, and 33% chance of being
in early, intermediate or late disease stage, respectively; 22%, 30%, and 48% chance of having a mild,
moderate or severe disease phenotype; 64% chance of being male; 44%, 44% and 12% chance of being delta F508
homozygous, delta F508 heterozygous or another CFTR genotype, respectively; average age = 28.13 years and an
antibiotic load = 0.11.

Specific antibiotic type is clearly a major component of any model 
designed to account for variables impacting the effect of antibiotic 
administration on human microbiota. We reasoned that route of 
antibiotic administration would also impact antibiotic effect and 
included this variable in our model as well. In developing this weight- 
ing component, we assessed the bacterial communities in 478 spu- 
tum samples from 66 persons with CF. Each combination of 
antibiotic type and route of administration used to treat these indi- 
viduals was provided a weight based on its ability to predict com- 
munity diversity, our outcome measure of interest. As such, this 
weighting was a function of the resistance of these communities to 
the perturbation caused by the combinations of antibiotic type and 
administration route included in our study. In studies of other 
human-associated microbial communities, other combinations of 
antibiotic type and route of administration would be expected to 
have different impacts. Assigning weights to these combinations 
therefore requires an analysis of a suitably sized data set relevant to 
the microbiome being investigated. 



We observed that, in general, antibiotic administration through an 
IV route had a significantly higher probability of effecting a greater 
change in community diversities, after controlling for subject age and 
%FEV 1 at the time of sampling, than did other routes of administra- 
tion. This may be due, in part, to the broad spectrum antimicrobial 
activities of agents such as cefepime and meropenem, which are 
administered exclusively via an IV route. A more complete analysis 
would need to take into account the antimicrobial susceptibilities of 
all species detected with deep-sequencing, including those that are 
not (or cannot be) routinely cultured in vitro. 

To demonstrate the utility of modeling antibiotic use in studies 
correlating the human microbiome and disease, we analyzed sets of 
CF sputum samples for which we had determined bacterial com- 
munity composition by deep sequencing. We included the antibiotic 
weighting score assigned to each sample as a covariate, allowing us to 
adjust for antibiotic use in a multivariate linear mixed model asses- 
sing other potential predictors of bacterial community diversity. In 
this analysis, we observed that community diversity was significantly 
associated with the taxonomic affiliation of the dominant OTU. 
Streptococcus dominated communities, for example, had signifi- 
cantly higher diversities than communities dominated by either 
Pseudomonas or Burkholderia, consistent with a recent study show- 
ing greater community diversity and higher relative abundance of 
Streptococcus in airway communities among outpatients with CF
compared to inpatients 23 . Our analysis also identified gender as a 
significant predictor of CF airway community diversity, with females 
having lower diversities than males. The adjustment for antibiotic 
use, as well as age, disease stage and the other variables included in 
the model suggests additional unidentified factors that reduce airway 
community diversity in women compared to men 24 . Disease severity 
in CF describes the rate of lung function decline relative to patient 
age 22 . In our analysis, after adjusting for antibiotic use and other 
variables, disease aggressiveness was not found to be significantly 
associated with community diversity, suggesting either that a larger 
sample size is needed to detect such an association or that other 
factors, most likely host-associated variables, play critical roles in 
CF lung disease progression. An important observation in our analysis
was that antibiotic use was found to be an independent predictor
of decreased CF airway bacterial community diversity. The correlation
between decreasing airway community diversity and lung disease
progression in CF has been noted in several recent studies 21,25-27;
however, the causal relationship between decreasing diversity and 
decreasing lung function has been the subject of controversy 28 . The 
results of our study support our previous observation suggesting that 
decreasing airway community diversity is driven primarily by anti- 
biotic therapy 21 . 

In summary, we have described an approach to account for anti- 
biotic exposure in studies examining the relationships between 
human microbiota and disease. More specifically, we propose a 
scheme, based on weighing variables associated with antibiotic use, 
to develop an antibiotic score that, in turn, can be included as a 
covariate in models exploring correlations between bacterial com- 
munities and human disease progression, particularly in conditions 
associated with repeated antibiotic therapy. We applied this scheme 
in a multivariate analysis of potential predictors of bacterial airway 
community diversity in a large cohort of persons with CF. Our find- 
ings sharpen previous observations of associations between decreased 
airway community diversity, lung disease progression, and variables 
such as gender and dominant community OTU. We show that anti- 
biotic therapy is an independent predictor of decreased airway com- 
munity diversity in CF. We expect that the specific weighting 
schemes we developed for our dataset will need to be modified to 
best suit studies of other disease conditions. Nevertheless, we pro- 
pose that the approach we describe will have broad applicability to 
such studies. 

Methods 

Patients and clinical samples. A total of 478 sputum samples, collected from 66 
adults with CF receiving care at the University of Michigan Health System, were 
included in this study. Sample collection and medical record review were approved by 
the University of Michigan Institutional Review Board, which waived the 
requirement for informed consent. Sputum specimens were collected during the 
course of routine medical care and stored at -80°C in 0.5 mL aliquots as described
previously 21. A minority of the samples was set aside for exploratory
analysis. This training data set consisted of 116 sputum samples collected from six
male CF patients during periods of 8 to 9 years as described previously 21 . The majority 
of sputum samples was used for validation analyses; this validation data set consisted 
of 362 sputum samples from 60 patients not included in the training data set. 

Sputum DNA extraction and pyrosequencing. DNA extraction and 
pyrosequencing were performed as described previously 21. Briefly, sputum aliquots
were thawed on ice and homogenized with 0.5 mL Sputolysin® (EMD Chemicals, San 
Diego, CA) before DNA was purified by an automated nucleic acid purification 
platform (MagNA Pure Compact System, Roche, Indianapolis, IN) according to the 
manufacturer's protocol. The 16S rDNA V3-V5 region was amplified with bar-coded 
primers and sequenced by the Human Genome Sequencing Center at Baylor College 
of Medicine using protocols developed for the Human Microbiome Project
(http://www.hmpdacc.org/tools_protocols/tools_protocols.php) as described 17.

DNA sequence processing and analysis. Raw sequences were analyzed using mothur 
v1.24 29. Reads were denoised by the PyroNoise component of the AmpliconNoise
suite of programs 30. Sequences containing homopolymers greater than 8 bp, 1
mismatch in the barcode or 2 in the primer, or one or more ambiguous bases were
removed. Remaining sequences that were at least 200 bp but less than 590 bp in



length were further curated to remove chimeric sequences using UCHIME 31 and to 
further reduce sequencing noise by a preclustering methodology 32 before being 
assigned to operational taxonomic units (OTUs) using an average neighbor algorithm 
with a 0.03 dissimilarity cutoff. To control for differences in sequencing depth, the reads
for each community were randomly subsampled to 568, the smallest number of reads among
the 478 samples, before calculation of community diversity.
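
The depth normalization described above amounts to drawing the same number of reads at random from every sample. The sketch below illustrates that idea only; it is not the implementation used by the sequence-analysis software, and the function name, OTU labels and counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Randomly subsample `depth` reads without replacement from OTU counts."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


# Hypothetical sample with 1000 reads, subsampled to the study's depth of 568.
sample = {"Pseudomonas": 900, "Streptococcus": 80, "Prevotella": 20}
rarefied = rarefy(sample, 568, seed=42)
print(rarefied)
```

Subsampling every community to the same depth prevents deeply sequenced samples from appearing more diverse simply because more rare OTUs were observed.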

Statistical methods. The inverse Simpson index 33 was calculated and used as a 
measure of bacterial community diversity and was modeled via linear mixed models 
(IBM SPSS Statistics package version 20) so that associations between predictors and 
community measures would take into account autocorrelation of repeated samples 
from the same individual. The training data set was used to study the effects of the 
duration of antibiotic administration and the proximity of administration to the time 
of sample collection. Both the training and validation data sets were used to assess the 
effects of antibiotic type and route of administration on community measures. The 
Akaike Information Criterion (AIC) was used to compare the predictive value of 
antibiotic weighting definitions applied to the training and validation data in 
developing the final antibiotic weighting scheme; since sample size affects AIC values, 
these comparisons were only made within models of the same cohort. Using the 
validation data set, mixed model analyses studied multivariate associations between 
the final antibiotic weighting scheme and community measures, accounting for other 
clinical factors of interest. 
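
For reference, the model comparisons above rest on the standard definition of the AIC, which is assumed here to match the quantity reported by the statistics package; k denotes the number of estimated parameters and \hat{L} the maximized likelihood. The second line works through the disease stage versus lung function comparison reported in the Results.

```latex
\[
\mathrm{AIC} = 2k - 2\ln\hat{L}
\]
\[
\Delta\mathrm{AIC} = \mathrm{AIC}_{\text{lung function}} - \mathrm{AIC}_{\text{disease stage}}
                   = 1579.56 - 1572.14 = 7.42
\]
```

A lower AIC indicates a better trade-off between fit and complexity, so the positive difference favors the model containing disease stage. Because the likelihood depends on the number of observations, such differences are meaningful only between models fitted to the same cohort.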

Antibiotic score calculations. Antibiotic weight component A (wcA) for antibiotic j, j = 1,...,37, is defined by:

\[
\mathrm{wcA}_{ij} =
\begin{cases}
1 & \text{if antibiotic } j \text{ was used on day } i \\
0 & \text{if antibiotic } j \text{ was not used on day } i
\end{cases}
\tag{1}
\]

for the i = 1,...,30 days approaching the sampling time.

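Antibiotic weight component B (wcB) contenders shown in Fig. 1B are defined by:

\[
\mathrm{wcB}_{i} =
\begin{cases}
\ldots & \text{Equal weight} \\
\ldots & \text{Linear increasing weight} \\
\ldots & \text{Concave increasing weight} \\
\ldots & \text{Convex increasing weight}
\end{cases}
\tag{2}
\]

[The four expressions are illegible in the source except for a term in \log_{10}(i); the resulting values are listed in Table S1.]

for the i = 1,...,30 days approaching the sampling time.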

The score for antibiotic j, j = 1,...,37, based on wcA and wcB becomes:

\[
\text{Antibiotic } j \text{ score} = \sum_{i=1}^{30} \mathrm{wcA}_{ij} \times \mathrm{wcB}_{i}
\tag{3}
\]

The total antibiotic exposure calculated from treating all antibiotics as equally weighted is:

\[
\sum_{j=1}^{37} \sum_{i=1}^{30} \mathrm{wcA}_{ij} \times \mathrm{wcB}_{i}
\tag{4}
\]



Weight component C (wcC) was assigned for antibiotic type j, j = 1,...,37, via:

\[
\mathrm{wcC}_{j} =
\begin{cases}
0.5 & \text{if best diversity reduction tercile} \\
0.33 & \text{if middle diversity reduction tercile} \\
0.17 & \text{if worst diversity reduction tercile}
\end{cases}
\tag{5}
\]

The final antibiotic exposure calculated from wcA, wcB and wcC is:

\[
\mathrm{WT} = \sum_{j=1}^{37} \sum_{i=1}^{30} \mathrm{wcC}_{j} \times \mathrm{wcA}_{ij} \times \mathrm{wcB}_{i}
\tag{6}
\]